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ABSTRACT 

Motivated  by  the  inability  of  current  drug  treatment  to  provide  long-term  benefit  to  HIV- 
infected  individuals,  we  derive  HIV  therapeutic  strategies  by  formulating  and  analyzing  a 
mathematical  control  problem.  The  model  tracks  the  dynamics  of  uninfected  and  infected 
CD4+  cells  and  free  virus,  and  allows  the  virus  to  mutate  into  various  strains.  At  each  point 
in  time,  several  different  therapeutic  options  are  available,  where  each  option  corresponds 
to  a  combination  of  reverse  transcriptase  inhibitors.  The  controller  observes  the  individual's 
current  status  and  chooses  among  the  therapeutic  options  in  a  dynamic  fashion  in  order  to 
minimize"  the  total  viral  load.  Our  initial  numerical  results  suggest  that  dynamic  therapies 
have  the  potential  to  significantly  outperform  the  static  protocols  that  are  currently  in  use; 
by  anticipating  and  responding  to  the  disease  progression,  the  dynamic  strategy  reduces  the 
total  free  virus,  increases  the  uninfected  CD4+  count,  and  delays  the  emergence  of  drug- 
resistant  strains. 
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1.  Introduction 

Optimal  treatment  of  the  human  immunodeficiency  virus  of  type  1  (HIV)  infection  is 
the  subject  of  intense  research  activity.  Rapid  progress  has  been  made  in  the  development 
and  testing  of  anti-HIV  therapeutic  agents,  resulting  in  the  approval  by  the  Food  and  Drug 
Administration  of  five  reverse  transcriptase  (RT)  inhibitors  (AZT.  ddl,  ddC,  d4T,  3TC)  and 
one  protease  inhibitor  (saquinavir).  These  potent  drugs  inhibit  viral  replication  and  lead  to 
a  rapid  decline  in  viral  abundance,  often  within  days  after  starting  therapy.  Unfortunately, 
these  drugs  have  had  only  limited  success  in  delaying  the  onset  of  AIDS:  continual  viral 
replication  of  HIV,  together  with  the  high  error  rate  of  reverse  transcription  of  viral  RNA  into 
DNA.  leads  to  the  emergence  of  drug-resistant  virus  strains,  typically  within  weeks  or  months 
(depending  upon  the  drug)  of  treatment  initiation  (McLeod  &  Hammer.  199*2;  Lagakos  et 
aL.  1993;  Volberding  ef  aL.  1994:  Wei  et  a/..  1995;  Schuurman  et  aL.  1995).  Moreover, 
multidrug  resistance  has  been  documented  in  combinations  of  RT  inhibitors  (Larder  et  aL. 
1993;  Richman  et  ah.  1994;  Shafer  et  aL.  1994,  1995;  Kojima  et  ai.,  1995)  and  combinations 
of  protease  inhibitors  (Condra  et  aL.  1995).  although  Larder  et  a!.  (1995)  have  recently 
shown  that  the  AZT-3TC  combination  is  able  to  sustain  in  vivo  antiviral  effects  for  at  least 
24  weeks. 

It  would  appear  that  increased  effectiveness  of  HIV  therapy  could  be  achieved  by 
developing  dynamic  multidrug  approaches,  where  the  combination  of  drugs  received  by  a 
patient  changes  over  time  in  response  to  the  disease  progression  (e.g.,  current  C4+  count, 
viral  load,  mix  of  viral  strains).  In  this  paper  we  mathematically  model  the  dynamic  multiple 
drug  therapy  problem  as  an  optimal  control  problem  that  can  be  informally  described  as 
follows:  choose  an  optimal  mix  of  drugs  at  each  point  in  time  in  order  to  minimize  the  total 
viral  load  over  some  time  horizon  (e.g..  one  year)  subject  to  a  toxicity  constraint  on  the  drug 
regimen. 


Several  mathematical  models  have  been  developed  that  incorporate  the  effects  of  ther- 
apy on  an  HIV-infected  individuals.  In  a  series  of  papers  (Perelson,  1989;  Perelson  et  al., 
1993;  Kirschner  &  Perelson,  1994;  Kirschner  &  Webb,  1995a;  Kirschner  et  al,  1995),  Perelson 
and  Kirschner  and  their  colleagues  have  studied  the  timing,  frequency  and  intensity  of  AZT 
treatment.  Agur  (1989)  focuses  on  the  optimal  tradeoff  between  the  toxicity  and  efficacy  of 
AZT.  The  mathematical  models  considered  in  these  studies  do  not  allow  for  virus  mutation. 
Nowak  et  al.  (1901),  McLean  t  Nowak  (1992),  Nowak  k  May  (1993),  Frost  k  McLean  (1994) 
and  Kirschner  &  Webb  (1995b)  analyze  descriptive  (as  opposed  to  optimization)  models  for 
the  competitive  interaction  of  AZT-sensitive  and  AZT-resistant  strains  of  HIV;  the  latter 
three  papers  also  include  numerical  simulations  of  alternating  and/or  combination  therapies. 

Our  deterministic  control  problem,  which  is  described  in  Section  2,  considers  a  finite 
number  of  virus  strains,  or  quasispecies,  and  allows  mutations  from  one  strain  to  another. 
A  finite  number  of  therapeutic  options  are  allowed,  where  each  option  consists  of  the  si- 
multaneous application  of  one  or  more  RT  inhibitors.  The  model  incorporates  uninfected 
CD4+  T-cells,  and  infected  CD4+  cells  and  infectious  free  virus  associated  with  each  virus 
strain;  in  this  context,  the  RT  inhibitors  prevent  the  free  virus  from  successfully  infecting 
uninfected  CD4+  cells.  Each  drug  option  has  a  different  efficacy  against  each  virus  strain, 
thereby  allowing  for  complex  drug-virus  interactions. 

Because  of  the  high  dimensionality  of  the  control  problem,  we  resort  to  approximation 
methods  in  Section  3;  more  specifically,  perturbation  methods  and  the  policy  improvement 
algorithm  are  used  to  derive  a  closed  form  dynamic  policy.  Several  other  types  of  therapeutic 
approaches,  such  as  protease  inhibitors  and  the  reconstitution  of  the  immune  system,  are 
considered  in  Section  4.  Results  from  numerical  simulations  are  reported  in  Section  5  and 
discussed  in  Section  6.  Concluding  remarks  can  be  found  in  Section  7. 


2.   Problem  Formulation 

Our  mathematical  model  incorporates  /  different  strains  of  HIV.  For  i  =  1. . . . ,  /,  let 
yi(t)  be  the  density  of  CD4+  cells  infected  by  strain  i  at  time  r,  and  let  u,-(£)  denote  the 
density  of  infectious  free  virus  of  strain  i;  noninfectious  virions  are  ignored  in  our  model. 
If  we  define  x(t)  to  be  the  density  of  uninfected  CD4+  cells  at  time  t,  then  the  state  of 
the  system  at  time  t  is  given  by  (x(t),  yi(t), . . . ,  yi(t),  V\{t), . . . ,  vj(t)),  which  is  denoted  by 
(x(t).yi(t).vt(t)). 

The  controller  has  J  therapeutic  options  at  time  t,  where  each  option  corresponds  to 
a  combination  of  RT  inhibitors.  For  example,  j  =  1  might  represent  AZT  used  with  ddl 
and  combination  j '  =  2  might  be  AZT.  ddC  and  d4T.  Our  control  variables  are  dj(t).  which 
is  the  amount  of  option  j  to  use  at  time  t.  We  assume  that  each  virus  strain  has  its  own 
infectivity  rate,  denoted  by  3,.  which  is  the  rate  that  it  infects  uninfected  CD4+  cells;  the 
reason  for  the  "tilde"  will  become  clear  in  the  next  section.  The  RT  inhibitors  reduce  virus 
infectivity  in  the  following  manner.  For  i  =  1....,/  and  j  =  1,...,J,  let  p3i  denote  the 
efficacy  of  drug  combination  j  in  blocking  new  infections  by  virus  strain  i.  Under  a  generic 
drug  policy  dj(t),  the  infectivity  of  virus  i  is  $i[l  —  J2j  Pj>dj{t)]:  we  assume  that  the  values 
of  pji  and  the  toxicity  parameters  in  equation  (4)  are  chosen  so  that  the  infectivity  of  each 
strain  is  nonnegative  under  all  feasible  therapeutic  strategies. 

The  dynamics  of  our  system  are  described  by  the  following  set  of  ordinary  differential 
equations: 

x(t)  =  A  -  (f,  +  £  ft«j(0[l  -  X>*di(*)])*(*).  (1) 


>=1  .7  =  1 

J 


m  =  (E  9wW*)[i  -  E/wM')Lk>)  -  o.-y.-(*),  (2) 

fc=l  3=1 

bi{t)  =  iriyi(t)-[ki  +  Pix(t)]vi(t).  (3) 

The  rate  at  which  uninfected  CD4+  cells  are  invaded  by  virus  strain  i  at  time  t  is  $tvl(t).r(t ). 


and  each  of  these  potential  infections  leads  to  a  reduction  in  free  virus.  The  rate  of  successful 
infections  by  strain  i  is  /?,-[l  — Z)j=]  Pjidj(t)]vi(t)x(t),  and  these  infections  cause  a  simultaneous 
decline  in  uninfected  cells  x(t)  and  rise  in  infected  cells  y,(i).  The  mutation  rate  gtj  in  (2) 
is  the  fraction  of  reverse  transcriptions  of  strain  i  that  result  in  a  cell  infected  by  strain 
j.  Hence,  J2j=i  <7«j  =  1  and  the  diagonal  terms  of  the  mutation  matrix  are  close  to  one  in 
value,  while  the  off-diagonal  terms  are  nearly  zero.  Strain  i  replicates  at  rate  7r,  after  it  has 
infected  a  CD4+  cell,  and  thus  free  virus  of  strain  i  is  produced  at  rate  7T,-y,-(i);  an  alternative 
interpretation  is  that  Trt/at  virions  are  produced  by  a  lytic  burst  during  the  death  of  an 
infected  cell.  Notice  that  the  quantities  pJt-,  qi3  and  7rj  can  capture  a  wide  range  of  drug-virus 
interactions,  including  both  synergistic  and  antagonistic  effects  of  combination  therapy,  as 
observed  by  St.  Clair  et  a/.  (1991)  and  Larder  ef  a/.  (1995),  and  surveyed  by  Wilson  and 
Hirsch  (1995). 

Uninfected  CD4+  cells  increase  because  of  (exponential)  proliferation  in  peripheral 
tissue  compartments  (e.g.,  secondary  lymphoid  organs)  and/or  (linear)  production  from  a 
pool  of  precursors  (e.g.,  the  thymus).  For  simplicity,  we  assume  a  constant  production  rate 
A.  In  addition,  we  let  [t,  at  and  kt  denote  the  respective  natural  death  rates  of  uninfected 
CD4+  cells,  cells  infected  by  strain  i  and  free  virus  of  strain  i. 

Because  the  primary  focus  of  this  paper  is  on  therapeutic  regimens  and  not  on  natural 
disease  progression,  we  purposely  do  not  incorporate  the  human  immune  response  into  the 
model.  Hence,  the  mutants  are  assumed  to  escape  from  the  drugs,  not  from  the  immune 
response.  The  model  also  ignores  latently  infected  cells;  although  most  plasma  virus  comes 
from  actively  infected  cells  (Ho  et  ai,  1995;  Wei  et  a/.),  this  does  not  imply  that  latently 
infected  cells  are  unimportant  for  the  emergence  of  drug  resistance.  Finally,  our  model  does 
not  incorporate  new  data  from  Shaw  (1995).  which  suggests  that  the  lymph  system  is  the 
location  of  considerable  production  of  plasma  virus  and  many  new  infections. 


We  impose  the  toxicity  and  nonnegativity  constraints 

£^(0<i.  (4) 

3  =  1 

dj(t)  >  0.  (5) 

where  t3  is  a  (normalized)  measure  of  the  toxicity  per  unit  time  of  drug  combination  j. 
Hence,  we  assume  that  toxicity  is  additive  across  drugs,  and  that  the  toxicity  threshold  is 
constant  over  time  (and  is  taken  to  be  unity,  without  loss  of  generality).  However,  because  the 
policies  that  emerge  from  our  analysis  typically  use  only  one  therapeutic  option  at  each  point 
in  time,  the  additivity  assumption  in  (4)  is  innocuous.  Moreover,  the  dynamic  policy  can  be 
easily  adapted  to  the  more  realistic  setting  where  the  toxicity  threshold  is  an  exogenously 
specified  function  of  either  time  (e.g..  six  months  of  intense  therapy  is  alternated  with  six 
months  of  light  therapy)  or  the  severity  of  the  patient's  side  effects:  we  propose  that  the 
drug  combination  dictated  by  the  index  policy  in  the  next  section  be  employed  at  each  point 
in  time,  but  the  dosage  should  be  altered  accordingly. 

Let  T  be  the  time  horizon.   Then  the  mathematical  control  problem  is  to  choose  the 
controls  {dj(t),t  >  0}  to  minimize 


L 


T5>(*)d  (6) 


0  t=l 


subject  to  (l)-(5). 

Our  control  problem  assumes  that  the  decision  maker,  in  choosing  dj(t)  at  time  t,  can 
observe  the  current  state  (x(t),yl(t).vl{t)).  Although  recent  technology  makes  it  possible 
to  quantify  virus  load,  CD4+  cell  counts  and  virus-infected  cells  in  the  blood  of  infected 
individuals,  these  techniques  are  not  currently  available  for  day-to-day  treatment  of  HIV- 
infected  patients  at  large.  Moreover,  quantification  of  virus  load  and  infected  cells  in  the 
lymph  system  is  even  more  complicated.  A  more  realistic  control  problem  for  current  day-to- 
day treatment  would  allow  the  controller  to  see  only  a  partial  observation  of  the  state  (e.g., 


only  the  total  CD4+  count  x(t)  +  YlUi  Vi(t)  an^  the  total  vir^  l°a<i  Hi=i  l';(0)-  The  control 
problem  (l)-(6)  then  becomes  a  nonlinear  filtering  problem;  although  partial  observations 
greatly  complicate  the  problem,  reasonably  good  policies  (e.g.,  the  linearized  Kalman  filter 
on  page  161  of  Elliott,  Aggoun  and  Morse,  1994)  can  be  derived.  However,  because  we 
are  primarily  interested  in  the  potential  value  of  employing  dynamic  multidrug  therapies, 
we  do  not  pursue  this  line  of  inquiry  and  hereafter  assume  that  the  controller  can  observe 
(x(t),yi(t),Vi(t)}  at  time  t. 

3.  Analysis 

The  control  problem  (l)-(6)  does  not  appear  to  admit  a  closed  form  solution.  More- 
over, standard  numerical  techniques  based  on  Pontryagin's  maximum  principle  (Pontrvagin 
ct  ai.  1962)  cannot  solve  a  problem  of  realistic  size  (e.g.,  five  drug  options  and  30  virus 
strains).  Hence,  we  resort  to  an  approximate  method,  which  employs  perturbation  methods 
in  conjunction  with  ideas  from  dynamic  programming. 

3.1.  Overview.  Let  V(x.yt.vt,t)  denote  the  cost  incurred  (as  given  in  (6))  from 
time  t  to  time  T  under  the  optimal  policy,  given  that  the  state  of  the  system  at  time  t  is 
(x,yi,Vi).  For  ease  of  notation,  we  suppress  the  dependence  of  V  on  its  arguments.  Then 
the  dynamic  programming  optimality  equation  is  (Bellman.  1957) 

dv      '        f^dv,         „      .  .  ,\ 


.     f  dV 

+  mm  <  -7— 
{^GfJ}l  d.r 


j 


1=1  J=l 

J 

(<) 

fc=l  j=\ 

where  Q  denotes  the  set  of  admissible  controls  that  satisfy  (4)-(5).  Our  basic  approach  is  as 

follows:  we  use  asymptotic  methods  to  obtain  closed  form  approximate  expressions  for  the 

(2I+l)-dimensional  process  {(x(t),yi(t),Vi(t)}  under  a  static  control  policy  (i.e.,  dj(t)  =  d3 
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for  all  t).  From  these  expressions,  we  derive  a.  closed  form  expression  for  V  which  has  the 
same  interpretation  as  the  V  in  (7),  except  that  the  (approximate)  cost  is  under  the  static 
policy,  not  the  optimal  policy.  Then  one  iteration  of  the  policy  improvement  algorithm 
(Howard.  1960)  is  performed:  we  differentiate  our  approximate  closed  form  expression  for  V 
with  respect  to  x  and  yt.  substitute  these  derivatives  into  the  right  side  of  (7)  and  perform 
the  embedded  minimization  in  this  equation. 

This  approach  yields  a  closed  form  dynamic  drug  control  policy:  it  specifies  how  much 
of  each  drug  combination  to  use  at  each  point  in  time  in  terms  of  the  current  state,  the 
current  time,  and  all  the  problem  parameters.  If  our  closed  form  expression  for  V  was 
exact,  then  dynamic  programming  theory  would  imply  that  our  proposed  policy  was  better 
(yields  a  lower  cost  in  (6))  than  the  static  policy  (in  fact,  if  our  expression  for  I'  was 
exact  then  repeated  policy  improvement  iterations  would  generate  a  sequence  of  policies 
that  converges  to  the  optimal  policy).  However,  our  expression  is  not  exact,  and  we  cannot 
draw  this  conclusion.  Nevertheless,  this  philosophy  (finding  an  approximate  value  for  V 
and  performing  one  iteration  of  the  policy  improvement  algorithm)  has  been  used  with 
considerable  success  in  designing  dynamic  call  acceptance/rejection  protocols  for  stochastic 
models  of  telephone  traffic  (e.g.,  Ott  and  Krishnan,  1985;  Key.  1990). 

3.2.  The  Optimal  Static  Policy.  Because  our  proposed  dynamic  policy  takes  a 
static  policy  as  its  starting  point,  it  is  natural  to  employ  the  optimal  static  policy.  The  static 

optimization  problem  is  given  by  the  following  nonlinear  program: 

r 
minimize,..,,,.,,,.^     ^  v,  (8) 

!  =  1 


bject    to     A  -(fi  +  Yl  &«i[l  "  EM;])-r  =  °-  (9) 

i=i  j=i 

(E  q*hvk[l  -J^Pjkdifjx  -  aiyi  =  0.  (10) 

fc=i  j=i 

TT.-t/i  -  [ki  +  0iX]Vi  =  0,  (11) 


£^<1,  (12) 

dj>0.  (13) 

We  have  been  unable  to  find  a  closed  form  solution  to  (8)-(13);  however,  standard  numerical 
techniques  (e.g.,  Avriel.  1976)  can  be  used  to  solve  this  problem.  If  the  additivity  con- 
straint (12)  is  not  realistic  then  one  can  change  constraint  (13)  to  d3  =  0  or  tJ  for  all  j; 
this  new  problem  can  be  solved  by  direct  enumeration  of  all  J  +  1  feasible  solutions. 

Let  d*  denote  the  optimal  static  policy  that  solves  (8)-(  13).  Under  this  policy,  the 
infectivity  of  virus  i  is  /^  J,,  where  dt  =  1  —  J2j=i  Pji^j- 

3.3.  Asymptotic  Analysis.  In  this  subsection  we  find  a  closed  form  approximate 
expression  for  the  system  trajectory  under  the  optimal  static  policy.  Consider  equations  ( 1  )- 
(3).  with  J,  taking  the  place  of  1  — 5l/=i  Pjidj(t).  Keeping  in  mind  that  we  will  use  the  solul  ion 
to  these  equations  to  derive  V.  let  us  consider  the  initial  conditions  (in  this  subsection,  s  is 
a  generic  time  index  and  t  denotes  a  specific  point  in  time) 

x(t)  =  x,        yi{t)  =  yi      Vi{t)  =  Vi.  (14) 

To  perform  the  perturbation  analysis,  we  introduce  the  parameter  e.  Without  loss  of 
generality,  let  us  assume  that  @\  =  rnin{i<,<j}  j3t.  Then  we  let  e  =  /?i  and  define 

&,  =  -■  (15) 

f 

Although  0!  =  1,  we  retain  this  parameter  in  the  model. 

The  asymptotic  analysis  is  based  on  the  assumption  that  e  is  small  in  value  (i.e.,  much 
less  than  one).  Typical  values  in  the  literature  for  the  infectivity  parameter  $  are  roughly 
10-5;  hence,  the  approximation  should  perform  well. 

We  assume  that  our  solution  is  of  the  form: 

x(s)  =  XM(s)  +  ex^(s).       y>(s)  =  y\0)(s)  +  ey{tl\s).       Vi(s)  -  v\0)(s)  +  er'1'^).       (16) 
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Although  we  could  define  and  solve  for  higher  order  terms,  just  deriving  the  first  order  terms 
is  quite  cumbersome,  at  least  without  a  computer.  We  use  (15)  to  replace  (3t  by  e/?t  and 
use  (16)  to  substitute  in  for  the  system  state  in  equations  (l)-(3)  and  (14).  For  i  =  0,1. 
we  collect  terms  of  order  e'  to  solve  for  the  unknown  processes  on  the  right  side  of  (16). 
Collecting  the  constant  (i.e.,  e°)  terms  yields  the  following  system  of  differential  equations: 

x{0)(s)  =  \-iix{0]{s).  (17) 

y\0)(s)  =  -*iy\0)(s),  (is) 

ii°\s)  =  nyl0)(3)  -  kiV^is),  (19) 

subject  to  the  initial  conditions 

,-<°>(0  =  .r.         y\°\t)  =  yt       $\t)  =  »,  (20) 

It  is  worth  noting  that  the  mathematically  perturbed  system  (17)-(20)  corresponds  precisely 
to  the  physical  perturbation  performed  when  giving  potent  RT  inhibitors.  In  Section  4,  we 
use  data  from  Wei  et  ai.,  Ho  et  al.  and  Perelson  et  al.  (1995),  who  used  various  RT  inhibitors 
and  protease  inhibitors  to  perturb  the  system,  to  estimate  values  for  the  model  parameters. 
The  solution  to  equations  (17)-(20)  is 

z(o)(3)  =  A  +  (x  _  A)e-M(.-*))  (21) 

V?\s)  =  Vi*-ai{~t\  (22) 

„<*»(,)  =  JEiW_e-a,(.^)  +  {v.  _  J™_)e-M*-*).  (23) 

kt  —  a,  kt  —  a; 

The  order  e  terms  lead  to  the  following  system  of  differential  equations: 

I 


x(%)  =  -MxW(a)-x(°>(a)j:M»}0)(-)i  (24) 

»}i)(-)=x(o,(*)(x:««A^f,(*))-a*»i1)(*).  (25) 
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v\'\s)  =  riyl%)  -  kivPia)  -  i3tx^(s)vl0)(s).  (26) 

subject  to  the  initial  conditions 

.r<1)(0  =  0,        yV(t)  =  0      u{1)(i)  =  0.  (27) 

Now  we  substitute  the  solution  (21)-(23)  into  (24)-(26)  and  solve  equations  (24)-(27)  sequen- 
tially. The  solution  is  given  in  Appendix  A. 

3.4.  The  Proposed  Dynamic  Policy.  With  the  solution  to  the  asymptotic  sys- 
tems (17)-(20)  and  (24)-(27)  in  hand,  we  can  estimate  the  value  function  V  by  Jt  5Zf_1(f,-  {s)  + 
ev\  (s))ds.  This  integration  is  carried  out  in  Appendix  B.  The  proposed  solution  is  greatly 
simplified  if  we  assume  that  the  value  function  is  independent  of  the  time  horizon.  In  prac- 
tice, the  time  horizon  T  is  very  large  (i.e..  in  years)  in  relation  to  the  time  scale  of  the 
systems  dynamics,  which  change  on  a  daily  basis  (Ho  ef  al..  Wei  et  a/.):  hence,  this  is  a 
natural  assumption.  Consequently,  we  set  T  =  oc  and  /  =  0  in  (46)  to  obtain  a  value 
function  (and  hence  a  policy)  that  is  independent  of  the  time  horizon;  this  approximate, 
stationary  value  function  is  denoted  by  Vc.  to  distinguish  it  from  the  value  function  in  (7). 
Differentiating  (46)  with  respect  to  x  and  yt  yields  (with  T  =  oo  and  t  =  0) 

WctUt^-l)!*)^^),  (28) 


+  e 


Wij3.  /_^_  +         x  ~  A//'        \  (j.  y-  Mil  -  - 
'  \otikiH       (ai  +  fi)(ki  +  fi)J  I    '  J=l  otjkj       Ja 


(29) 


%  Ctiki 

Now  we  can  substitute  these  derivatives  into  (7)  and  perform  the  minimization.  Be- 
cause the  function  to  be  minimized  is  linear  in  our  controls  dj(t),  the  solution  d*(t)  is  one  of 
the  (at  most  J  +  1)  extreme  points  of  the  constraint  set  (4)-(5),  and  can  be  found  in  closed 
form.  Let  us  define  the  dynamic  quantities 

I 


Ci{t)  =  0Mt) 
10 


7=1 


Cilk'i 


/ Ki quel,  _  J_      y.  *jqijdi\  /7T/[^A(Qf  +  A-;  +  // )  +  Qf  Ay  (?,/.;■  (Q  -  yf(0]  _    fv(Q  \ 
^  ";A,        fc,      ^    ajkj   '\  ctiki(cu  +  fi)(kt  +  (i)  k  +  (i' 

Then  the  proposed  policy  is:  apply  no  therapy  (i.e.,  d^(t)  =  0  for  j  =  1 J )  if 


(30) 


max    <  0:  (■31) 

{l<j<J}  Tj 

otherwise,  use  drug  combination  j'  (i.e..  d".(t)  =  t~,  ,  d~(t)  =  0  for  j  ^  j"),  where 

eLiPj.-c»(0  /QOx 

j    =  argmax^^^jj .  (32) 

We  conclude  this  subsection  with  several  remarks. 

The  proposed  therapeutic  strategy  is  a  dynamic  index  policy,  where  drug  combination 
j  has  the  dynamic  index  Ht=]  PjiCi(t)/Tj,  and  at  each  point  in  time  the  policy  uses  the  drug 
combination  that  possesses  the  largest  index.  The  quantity  ct(t)  essentially  represents  the 
marginal  increase  in  the  total  future  viral  load  if  we  let  one  more  CI)4+  cell  get  infected  by 
virus  strain  i  at  time  /.  The  index  for  drug  combination  j  is  computed  by  weighting  the 
dynamic  marginal  cost  for  each  virus  by  the  efficiency  of  drug  j  for  that  virus,  summing  up 
over  the  virus  strains,  and  dividing  by  the  drug  combination's  toxicity.  Hence,  our  dynamic 
policy  uses  information  on  the  effectiveness  of  each  drug  against  each  virus,  the  current 
potential  cost  (in  terms  of  total  future  viral  load)  of  a  new  infection  by  each  virus  strain, 
and  the  toxicity  of  each  drug  combination,  and  summarizes  this  information  in  a  succinct 
manner.  .  An  important  advantage  of  index  policies  is  their  ease  of  use:  the  complexity  of 
the  derivation  and  implementation  of  an  index  policy  is  independent  of  the  problem  size; 
hence,  policy  (30)-(32)  can  easily  be  derived  for  a  problem  with  20  drug  combinations  and 
150  virus  strains.  Although  this  policy  is  not  the  optimal  solution  to  problem  (l)-(6).  the 
optimal  solution  to  dynamic  resource  allocation  problems  is  often  characterized  by  index 
policies  (e.g..  Gittins.  1989). 

Notice  that  if  we  took  e  =  0  in  (30)  then  the  policy  would  be  independent  of  x  and  ?/,-, 
whereas  the  proposed  policy  depends  on  the  entire  (2/  -+-  l)-dimensional  system  state;  this 
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suggests  that  incorporating  only  the  e°  terms  leads  to  a  rather  crude  policy.  Also,  one  of  the 
drug  combinations  would  always  be  administered  (i.e.,  inequality  (31)  never  holds)  if  e  =  0. 
Inquality  (31)  was  never  satisfied  in  any  of  our  numerical  runs:  the  proposed  dynamic  policy 
always  used  one  of  the  drug  combinations. 

It  is  possible  to  implement  further  iterations  of  the  policy  improvement  algortihm.  Let 
us  denote  the  proposed  policy  in  (30)-(32)  by  eQ  (t);  although  this  control  is  state-dependent, 
it  can  be  expressed  solely  as  a  function  of  time  because  the  system  is  deterministic.  If  we 
define  dt  (t)  =  1  —  J2j=iPjidj  \t)  then  this  quantity  can  be  used  as  our  starting  policy 
for  the  next  policy  improvement  iteration.  Turning  to  the  asymptotic  analysis,  we  observe 
that  the  e°-order  system  (17)-(20)  and  its  solution  (21)-(23)  are  independent  of  the  control. 
If  we  substitute  d]  (t)  in  for  dt  in  equations  (24)-(25).  then  the  eorder  system  (24)-(27) 
is  still  a  set  of  linear  differential  equations  thai  can  be  easily  solved  numerically  using  the 
matrix  exponential  (Golub  and  Van  Loan.  1989).  Then  we  can  carry  out  the  calculations  in 
Appendix  B  and  equations  (28)-(29)  on  a  computer,  and  perform  the  minimization  in  (7) 
to  get  a  new  policy  d }  (t).  Of  course,  higher  order  terms  in  the  asymptoic  expansion  can 
also  be  performed  relatively  easily  with  a  computer,  and  it  is  conceivable  that,  with  enough 
higher  order  terms  and  enough  policy  iterations  (typically,  only  a  handful  of  the  latter  is 
required  to  get  close  to  optimality).  such  a  procedure  would  generate  a  policy  that  is  very 
close  to  optimal.  However,  because  the  proposed  policy  in  (30)-(32)  performed  well  in  our 
numerical  study,  we  have  not  pursued  this  computational  approach. 

3.5.  A  Symmetric  Case.  To  gain  a  better  understanding  of  the  proposed  policy, 
we  focus  on  the  symmetric  case  where  I  =  J,  a,  =  a,  kt  =  k,  7rt  =  7r,  /?,•  =  /?,  Tj  =  1,  pn  =  p, 
Ptj  —  p  for  i  ^  j  (where  p  >  p),  qn  —  q  and  q,j  =  (I  —  q)/(I  —  1)  for  i  /  j.  Hence,  each  virus 
strain  has  the  same  parameter  values,  and  for  i  =  1, . . . ,  /,  drug  combination  i  is  targeted  at 
strain  i.  Also,  let  y  =  Y.\=\  Vi  and  v  =  £)f=1  vt  denote  the  total  number  of  infected  cells  and 
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the  total  free  virus,  respectively.  Symmetry  arguments  imply  that  one  of  the  two  symmetric 
static  extreme  point  policies  is  optimal  (although  perhaps  not  uniquely  so):  dj(t)  =  0  for  all 
j  and  t,  or  d3{t)  =  1/7  for  all  j  and  t.  Then  dt  =  1  —  p  for  all  i.  where  p  is  the  "average 
efficacy"  of  the  static  drug  policy,  which  is  zero  for  the  first  policy  and  [p  +  {I  —  1  )p]/I  for 
the  second  policy.  The  best  static  policy  (i.e.,  the  one  minimizing  (8))  is  dj(t)  =  0  if  0  <  0 
and  is  dj(t)  =  1/7  if  0  >  0.  where 


#  = 


//(A--  q; 


airy  iTii        k  —  oq 

— +  (<--rJf-)(— r- 

a  A-  —  a  A' 


A 

x-- 
/' 


+  (v--, )- 


(q  +  /.i)(k  —  a)  k  -  a    k  +  fi. 

(33) 


The  solution  in  the  symmetric  case  reduces  to:  If  7r(  1  —  p)  >  a  then  apply  no  drugs  if 

r-(.r-y)>— — -  + ;  (34) 

~  p[?r(  1  —  p)  —  a\  ak 

if  7r(  1  —  p)  <  a  then  apply  no  treatment  if 

Q  +  /'         ,  (a  +  //)(fc  +  /x)       A(q  +  A-  +  /0 

r-(r-«/)  <  ^7— r  + •  (35) 

~  /3|_3r  (_  1  —  p)  —  aj  qA' 

Otherwise  administer  drug  combination  /*,  where  v~(t)  >  Uj(f)  for  ?  =  1. ....  7. 

Hence,  the  proposed  policy  in  the  symmetric  case  always  applies  the  drug  combination 

that  corresponds  to  the  most  prevalent  virus.    This  therapeutic  strategy  certainly  seems 

reasonable,  although  not  obvious  nor  necessarily  optimal.   Also,  the  drug/no  drug  decision 

in  (34)-(35)  depends  only  on  the  relative  value  of  the  amount  of  free  virus  v  and  the  difference 

between  the  number  of  uninfected  and  infected  cells.  Hence,  the  "switching  curve"  (between 

the  drug/no  drug  decision)  in  three  dimensions  (i.e..  x,  y  and  v)  can  actually  be  expressed 

as  a  straight  line  in  two  dimensions. 

4.  Alternative  Therapies 

The  tedious  part  of  the  analysis  in  Section  3  is  the  perturbation  analysis  that  leads  to 
the  derivatives  of  the  value  function  for  a  generic  static  policy.   Now  that  these  derivatives 
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have  been  estimated,  it  is  a  relatively  simple  matter  to  consider  other  types  of  therapies. 
Here  are  two  examples;  much  of  the  previous  notation  is  reused. 

Protease  Inhibitors.    Protease  inhibitors  render  newly  produced  virions  non-infectious. 
Suppose  we  have  J  combinations  of  protease  inhibitors,  and  the  controller  must  decide  how 
much  of  each  to  use  subject  to  II,=i  Tjdj(t)  <  1-    The  drugs'  effectiveness  is  given  by  the 
matrix  pJn  and  the  resulting  virus  replication  rate  is  7rt[l  —  Y?i=\  Pjidj{t)].  An  analysis  similar 
to  that  in  Sectic;.  3  yields  the  dynamic  index 

E'"'**^*"  (36) 

for  drug  combination  j.  At  each  point  in  time,  the  proposed  therapeutic  strategy  uses 
the  combination  with  the  largest  index  if  this  index  is  positive:  otherwise,  no  drugs  are 
administered.  Note  that  'i^f-  should  be  positive,  so  therapy  should  always  be  applied.  Define 
dt  =  1  —  J2j=i  Pjid*,  where  dj  solves  the  static  optimization  problem  that  is  analogous  to  the 
one  in  Section  3.2.  Integrating  the  approximate  value  function  in  Appendix  B  with  respect 
to  Vi  gives 

W.       1  tA        fegofe  -iW-A).  (37) 

dvi       ki      ki(ki  +  it)  [p    ajkj        h)  v 

and  substitution  of  this  quantity  into  (36)  yields  the  proposed  therapy  in  terms  of  the 
problem  parameters  and  the  current  state.  For  the  symmetric  case  considered  in  Section 
3.5,  the  solution  in  (36)-(37)  employs  the  protease  inhibitor  combination  that  corresponds 
to  the  largest  value  of  y,{t)  at  each  time  t;  that  is,  the  therapy  targets  the  virus  strain  that 
has  infected  the  most  cells. 

Reconstituting  the  Immune  System.  Consider  using  a  drug,  such  as  interleukin-2 
(IL-2),  that  reconstitutes  the  immune  system.  This  drug  affects  our  model  by  increasing  A, 
the  production  rate  of  uninfected  CD4+  cells.  Suppose  the  new  production  rate  is  A  +  A(i), 
where  our  control  A(r)  €  [0,A].   Then  the  optimization  problem  embedded  in  the  dynamic 
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programming  optimality  equation  is  simply  to  minimize  A«-.  The  proposed  policy  is 

\*(t)  =  \     if     ^<0.  (38) 

ox 

and  X*(t)  =  0  otherwise,  where  4^  is  given  in  equation  (28)  (with  dt  =   1).    Define  the 

constants 

*  =  £*?-!-•  (») 

If  ct  >  0  for  all  i  then  we  never  use  the  drug,  and  if  c,  <  0  for  all  i  then  we  always  use  the 
drug;  if  neither  of  these  cases  hold  then  a  dynamic  policy  is  optimal.  The  first  quantity  on 
the  right  side  of  (39)  is  the  expected  (with  respect  to  the  mutation  probabilities  qtJ)  value  of 
w/(ak).  Since  the  mutation  rates  are  very  small,  this  quantity  is  nearly  equal  to  ^/(q^A:,), 
and  if  we  ignore  mutations  then  the  drug  is  always  given  if  and  only  if  7r£-  >  a,  for  all  i.  If 
~,  >  at.  then  the  "basic  reproductive  ratio  I\\,"  for  the  infected  cells  is  greater  than  unity 
(that  is,  each  infected  cell  will  produce  more  than  one  free  virus  particle  during  its  lifetime), 
and  so  this  drug  effectively  "adds  more  fuel  to  the  fire".  Empirical  results  (see  Section  5) 
suggest  that  7rt  >  a,;  hence,  adding  uninfected  CD4+  cells  in  isolation  is  not  desirable;  of 
course,  our  model  has  not  incorporated  an  immune  response,  and  thus  may  be  omitting 
some  positive  side  effects  of  additional  CD4+  cells.  Although  we  do  not  do  so  here,  a  similar 
analysis  can  be  performed  for  a  therapy  that  reduces  CD4+  cell  production. 

Other  therapies  that  can  be  analyzed  include  certain  forms  of  immunotherapy  (which 
would  increase  the  death  rate  of  free  virus  and/or  the  death  rate  of  infected  cells),  ex  vivo 
expansion  of  CD4+  cells  (Wilson  et  al.,  1995).  which  would  simultaneously  increase  x  and 
decrease  y;,  and  dynamic  gene  therapy  (Nabel,  1994),  which  would  increase  i\  for  certain 
strains. 

Most  importantly,  we  can  also  consider  employing  some  of  these  therapies  simultane- 
ously, with  a  joint  toxicity  constraint.  For  example,  if  one  allowed  the  simidtaneous  use  of 
RT  inhibitors  and  production  of  CD4+  cells  (Schwartz  et  a/.,  1991),  it  may  turn  out  to  be 
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beneficial  to  introduce  CD4+  cells  at  times  when  the  infectivity  of  the  virus  and  the  viral 
load  are  sufficiently  suppressed. 

5.  An  Illustrative  Example 

In  this  section,  the  dynamic  model  is  simulated  under  the  proposed  policy,  as  well  as 
under  several  simpler  policies.  No  attempt  has  been  made  to  generate  a  model  of  realistic  size; 
rather,  we  consider  only  two  virus  strains  and  two  drug  combinations  in  order  to  illustrate 
the  nature  and  the  impact  of  dynamic  drug  treatment.  In  a  subsequent  study,  we  plan  to 
use  data  from  multidrug  clinical  trials  to  analyze  larger  models. 

5.1.  Parameter  Values.  The  parameter  values  for  our  model  are  displayed  in 
Table  1.  Most  of  these  values  were  sequentially  derived  from  existing  data  in  the  following 
manner.  Ho  et  al.  and  Wei  et  al.  estimate  a  CD4+  production  rale  of  roughly  1.8  x  109 
cells  per  day.  About  2%  of  the  total  CD4+  population  resides  in  the  peripheral  blood,  and 
a  human  body  contains  roughly  5  x  106  mm3.  Hence,  the  1.8  x  109  figure  is  comparable  to  7 
cells/mm3  per  day.  The  death  rate  //  was  chosen  to  be  0.007  per  day,  so  that  the  virus-free 
equilibrium  CD4+  count  is  A///  =  1000  cells  per  mm3,  which  corresponds  to  the  CD4+  count 
for  an  uninfected  individual. 

Now  we  turn  to  the  death  rates  a,  of  infected  cells  and  A*;  of  free  virus.  Ho  et  al.  and 
Wei  et  al.  estimate  the  death  rate  of  virus-producing  cells  to  be  about  0.35  per  day.  More 
recentby.  using  more  accurate  data.  Perelson  et  al.  (1995)  show  that  the  mean  death  rate  is 
about  0.49.  They  were  also  able  to  get  a  rough  estimate  of  3.07  for  the  death  rate  of  free 
virus. 

Some  of  the  remaining  parameters  are  derived  by  considering  the  quasi-steady  state 
conditions  before  drug  treatment.  For  typical  pre-treatment  values,  we  use  the  average  values 
(over  20  individuals)  in  Table  1  of  Ho  et  a/.;  the  average  pre-treatment  CD4+  count  was  180 
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cells  per  mm3  and  the  average  viral  load  was  134  virions  per  mm3.  Only  the  wild-type 
virus  was  present  in  nonnegligible  amounts  before  treatment.  Hence,  we  can  consider  only 
the  wild-type  virus,  and  set  the  left  side  of  equations  (l)-(3)  equal  to  zero  (reflecting  the 
quasi-steady  state)  to  obtain  a  set  of  four  equations  (equations  (l)-(3)  and  x  +  y  =  180) 
and  four  unknowns:  the  pre-treatment  number  of  uninfected  cells  x  and  infected  cells  y,  the 
infectivity  rate  J3  and  the  replication  rate  K.  Substituting  180  -  y  for  x  in  (1)  and  (2)  and 
solving  these  two  equations  for  J3  and  y  yields  J3  =  (aX  —  180a/i)/(lS0m>  -  \v)  =  2.58  x  10-4 
mm3  per  day  and  y  =  180/)i>/(a  +  fSv)  =  11.86  cells  per  mm3.  Hence,  x  =  168.14  cells 
per  mm3,  and  the  fraction  of  cells  that  are  infected  is  y/(x  +  y)  =  0.066,  which  is  in  close 
agreement  with  the  estimate  of  5%  found  in  Embretson  et  al.  (1993).  Finally,  solving  (3) 
for  7r  yields  k  =  v(k  +  0x)/y  —  35.18  virions  per  day,  implying  that  tt/o  =  71.8  virions  are 
produced  by  an  infected  cell  in  our  model.  Readers  should  keep  in  mind  that  most  virus 
is  produced  in  the  lymph  system,  whereas  our  estimates  for  k  and  $  are  based  on  plasma 
concentrations. 

We  use  the  mutation  rate  calculated  in  Mansky  and  Temin  (1995),  qu  =  921  =  3.4  x 
10"5.  We  also  assume  that  drug  combination  i  is  targeted  at  virus  strain  i.  More  specifically, 
we  let  pn  =  P22  =  0.95  and  p12  =  P21  =  0.05.  meaning  that  each  drug  combination  is  95% 
effective  at  blocking  infections  of  its  own  strain,  but  only  5%  effective  at  blocking  infections 
of  the  other  strain;  such  a  state  of  affairs  might  arise  if  virus  1  is  an  AZT-resistant  strain 
that  is  dominant  at  time  zero,  and  the  two  drug  combinations  correspond  to  two  other  (i.e., 
not  AZT)  RT  inhibitors.  The  toxicity  coefficients  are  set  to  tx  -  r2  =  1.  so  that  if  dj(t)  =  1 
then  the  amount  of  drug  combination  j  administered  corresponds  to  the  threshold  toxicity 
level. 

Notice  that,  until  now,  the  parameter  values  are  consistent  with  the  symmetric  case 
introduced  in  Section  3.5.    Now  we  introduce  asymmetry  by  letting  n2  =  0.9^!;  hence,  we 
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assume  that  virus  1  has  a  higher  replication  rate  than  virus  2.  The  pre-treatment  equilibrium 
state  was  taken  as  the  starting  point  of  our  simulation  runs  (see  Table  1). 


Variabl< 

3S 

Initial  Values 

X 

Density  of  Uninfected  CD4+  cells 

168.08  cells  mm-3 

V\ 

Density  of  cells  infected  by  strain  1 

11.88  cells  mm-3 

2/2 

Density  of  cells  infected  by  strain  2 

0.004  cells  mm-3 

Vl 

Density  of  free  virus  of  strain  1 

134.25  virions  mm-" 

v2 

Density  of  free  virus  of  strain  2 

0.04  virions  mm-" 

Parameters 

Values 

A 

Production  rate  of  uninfected  CD4+  cells 

7.0  cells  mm-3  day-1 

/' 

Death  rate  of  uninfected  CD4+  cells 

0.007  day"1 

0-1,02 

Death  rates  of  infected  CD4+  cells 

0.49  day"1 

*i,  k2 

Death  rates  of  free  virus 

3.07  day"1 

Lfc 

Infectivity  rates  of  free  virus 

2.58  x  10-4  mnr3  day-1 

-\ 

Replication  rate  of  strain  1 

35.18  virions  day-1 

7T2 

Replication  rate  of  strain  2 

31.66  virions  day-1 

912,921 

Mutation  rates  (probabilities) 

3.1  x  10-5 

Pll,P22 

Drug  efficacies  on  their  own  strain 

0.95 

Pl2,P21 

Drug  efficacies  on  the  other  strain 

0.05 

T\,T2 

Toxicity  parameters  (dimensionless) 

1.0 

Table  1.  Parameter  values  for  the  model. 

5.2.  Simulation  Results.  For  the  set  of  parameter  values  specified  in  Subsection  5.1,  we 
computed  the  proposed  dynamic  multidrug  therapy  given  by  (30)- (32),  and  then  determined 
the  system  behavior  under  this  regimen  by  simulating  equations  (l)-(3)  up  to  time  T  equals 
six  months  (the  system  had  nearly  reached  its  steady  state  by  this  time).  The  simulation 
results  were  generated  by  the  automatic  step  size  Runge-Kutta-Fehlberg  method,  with  a 
fourth  and  fifth  order  pair  for  higher  accuracy.  Our  results  are  shown  in  Figure  1.  The 
horizontal  axis  in  all  five  graphs  is  time,  and  the  uninfected  CD4+  count  x(t)  is  given  in 
Figure  la,  the  viral  loads  vt(t)  are  pictured  in  Figures  lb  and  lc,  the  total  viral  load  is  in 
Figure  Id  and  the  proposed  dynamic  therapy  is  displayed  in  Figure  le.  Figure    2  displays 
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Figure  1.  System  behavior  under  the  dynamic  policy. 
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Figure  2.  System  behavior  under  the  continuous  treatment  policy. 
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the  same  quantities  under  the  continuous  treatment  policy  that  continually  applies  drug  1 
(i.e..  di(t)  =  l.d2(t)  =  0  for  all  /).  Under  both  policies,  the  shape  of  the  v,  and  y,  curves 
were  very  similar,  with  the  free  virus  vt  lagging  behind  the  infected  cell  count  y,  by  several 
days;  hence,  the  dynamics  of  the  infected  cells  do  not  appear  in  Figures  1  and  2.  We  also 
simulated  the  model  under  the  optimal  static  policy  derived  in  Section  3.2;  the  solution  was 
d{  —  0.5-3,  d2  =  0.47.  Although  we  do  not  include  figures  for  the  optimal  static  policy,  the 
average  (over  one  year)  uninfected  CD4+  count  and  average  total  viral  load  for  the  various 
policies  are  reported  in  Table  2.  Readers  should  note  that  the  optimal  static  policy  was 
derived  under  the  assumption  that  drug  toxicities  are  additive;  if  this  assumption  is  not 
valid  then  the  static  policy  is  not  a  feasible  alternative. 


Average  Total 

Average  Uninfected 

Viral  Load 

CD4+  Count 

Policy 

(virions  per  mm3) 

(cells  per  mm3) 

No  Drugs 

131.29 

168.08 

Continuous  Treatment 

110.99 

221.11 

Optimal  Static 

71.07 

365.25 

Dynamic 

62.60 

380.24 

Table  2.  Summarv  of  numerical  results. 


6.   Discussion 


It  should  be  stressed  that  we  are  considering  an  illustrative  example:  the  numerical  values 
in  this  section  are  merely  the  result  of  a  particular  computer  simulation,  with  a  particular 
choice  of  parameters. 

Dynamic  Policy.  At  time  zero,  the  system  is  in  its  drug-free  equilibrium,  and  virus 
strain  1  is  dominant.  As  expected,  the  viral  load  in  Figure  1  drops  to  about  \%  of  its  pre- 
treatment  value  within  ten  days,  and  it  stays  there  until  the  beginning  of  the  fourth  month. 
There  is  a  concomitant  increase,  which  is  roughly  linear,  in  the  number  of  uninfected  CD4  + 
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cells,  and  this  quantity  peaks  at  three  months;  the  peak  corresponds  to  more  than  a  three- 
fold increase  over  the  pre-treatment  value.  Not  surprisingly,  the  dynamic  policy  initiates 
treatment  with  drug  1.  The  policy  first  uses  drug  2  on  day  18,  and  irregularly  (e.g.,  it  stays 
with  drug  1  for  over  a  week  at  the  beginning  of  the  second  month)  switches  back  and  forth 
between  the  two  drugs  about  once  per  day  until  the  end  of  the  third  month.  During  the  fourth 
month,  the  two  virus  strains  feed  on  the  large  pool  of  uninfected  cells  and  simultaneously 
emerge,  peaking  with  a  total  viral  load  that  is  roughly  four  times  the  pre-treatment  level. 
The  drug  treatment  oscillates  rapidly  between  the  two  options  during  this  month,  in  an 
attempt  to  simultaneously  control  both  viruses.  The  less  fit  virus  2  constitutes  about  55% 
of  the  free  virus  during  the  fourth  and  fifth  months,  and  hence  the  dynamic  policy  exerts 
the  majority  of  its  effort  in  controlling  the  more  fit  virus  1. 

The  large  viral  load  in  turn  leads  to  more  cell  infections,  and  the  uninfected  CD4+  count 
decreases  in  the  fourth  month,  bottoming  out  at  a  level  that  is  higher  than  the  pre-treatment 
uninfected  CD4+  count.  This  reduction  in  uninfected  cells  allows  the  drugs  to  bring  both 
viruses  back  under  control  during  the  fifth  month.  After  this  time,  these  oscillations  dampen 
(e.g.,  the  total  viral  load  peaks  again  at  the  end  of  the  seventh  month  with  a  value  of  about 
240)  and  a  new  steady  state  is  slowly  reached.  Over  the  first  six  months,  the  dynamic  policy 
allocated  58.7%  of  the  time  to  drug  1  and  41.3%  of  the  time  to  drug  2:  hence,  the  dynamic 
policy  expended  more  effort  than  the  optimal  static  policy  on  the  more  fit  virus. 

Continuous  Treatment.  Under  the  continuous  treatment  policy  pictured  in  Figure  2, 
we  again  see  a  rapid  drop  in  virus  1  and  a  linear  rise  in  the  CD4+  count;  however,  in  contrast 
to  the  dynamic  policy,  the  continuous  treatment  policy  suppresses  virus  1  throughout  the 
first  six  months.  Virus  2  emerges  during  the  second  month,  and  reaches  a  very  high  level  by 
the  end  of  the  month.  This  high  viral  load  then  drops  the  uninfected  CD4+  count  below  its 
pre-treatment  level,  which  in  turn  leads  to  a  reduction  of  free  virus  2  in  the  third  month. 
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Comparison  of  Policies.  Table  2  shows  that  the  dynamic  policy  performs  much  better 
than  the  continuous  treatment  policy:  the  continuous  policy  achieves  a  17.4%  reduction 
in  average  (over  the  first  six  months)  viral  load  and  a  31.6%  increase  in  uninfected  CD4+ 
count  with  respect  to  the  drug-free  equilibrium,  whereas  the  dynamic  policy  achieves  a 
53.4%  reduction  in  viral  load  and  126.2%  increase  in  uninfected  CD4+  count.  Moreover,  the 
viral  peaks  and  CD4+  valleys  are  less  pronounced  under  the  dynamic  policy  than  under  the 
continuous  treatment  policy.  Finally,  the  dynamic  policy,  by  frequently  switching  between 
the  two  drug  options  during  the  early  months,  delays  the  emergence  of  virus  2  from  the 
second  month  until  the  fourth  month.  Although  the  results  are  not  reported  here,  we  also 
tested  an  alternating  policy  that  uses  drug  1  for  the  first  three  months  and  then  uses  drug 
2  for  the  last  three  months.  Not  surprisingly,  this  policy  did  not  perform  well. 

We  should  note  that  the  post-treatment  steady  state  values  of  uninfected  CD4+  cells 
and  total  virus  load  under  all  of  the  therapeutic  strategies  are  not  appreciably  different 
than  the  pre-treatment  steady  state  values;  hence,  the  therapeutic  benefits  in  our  model 
are  achieved  during  the  transient  domain.  However,  the  viral  dynamics  of  HIV-infected 
individuals  undergoing  dynamic  drug  treatment  would  probably  exhibit  transient  behavior 
for  many  years.  Therefore,  the  improvements  over  the  transient  domain  in  our  model  should 
be  indicative  of  the  improvements  that  can  be  realized  in  clinical  practice. 

Unequal  Infectivity  Hates.  Because  there  seems  to  be  some  uncertainty  about  whether 
virus  strains  have  different  infectivity  rates,  we  reduced  the  infectivity  of  virus  2  so  that 
02  =  0.9/?i-  The  qualitative  results  were  similar  to  our  base  case,  although  the  dynamic 
policy  expended  a  slightly  larger  fraction  of  its  effort  on  virus  1,  resulting  in  a  somewhat 
larger  proportion  of  virus  2  in  the  viral  mix.  More  generally,  numerous  other  simulation 
runs  were  generated  by  varying  the  parameters  7r£-,  $,  A  and  p(J,  and  the  qualitative  results 
displayed  in  Figures  1  and  2  remained  intact;  hence,  the  model  appears  to  be  robust. 
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7.  Conclusions 

We  have  used  the  control  theory  paradigm  in  a  HIV  therapeutic  setting.  Our  model 
incorporates  different  virus  strains,  and  a  variety  of  therapeutic  options  are  available.  The 
approximation  method,  which  uses  perturbation  analysis  and  the  policy  improvement  al- 
gorithm, gives  rise  to  a  dynamic  index  policy:  each  drug  combination  has  an  associated 
dynamic  index,  and  at  each  point  in  time  the  policy  uses  the  drug  combination  with  the 
largest  index.  The  dynamic  indices  succinctly  summarize  three  quantities:  the  efficacy  of 
each  drug  combination  on  each  virus  strain,  the  toxicity  of  each  drug  combination,  and  the 
marginal  benefit  of  blocking  a  new  cell  infection  by  each  virus  strain;  the  last  of  these  three 
quantities  changes  over  time  as  a  function  of  an  individual's  CD4+  count,  viral  load  and 
viral  mix. 

Numerical  results  for  a  two- virus,  two-drug  model  suggest  that  dynamic  multidrug  ther- 
apies outperform  their  static  counterparts:  the  total  viral  load  is  reduced,  the  uninfected 
CD4+  count  is  increased,  and  the  emergence  of  drug  resistant  strains  is  delayed.  Although 
the  individualized  therapy  that  we  propose  is  no  doubt  more  difficult  to  implement  than 
protocols  that  are  currently  in  practice,  the  benefits  may  outweigh  the  costs  of  implemen- 
tation. These  benefits  are  achieved  by  frequently  changing  therapies  over  time  in  response 
to  and  in  anticipation  of  the  emergence  of  drug-resistant  strains.  In  addition,  the  dynamic 
policy  attempts  to  maintain  the  more  fit  virus  strains  at  a  relatively  low  level,  while  perhaps 
allowing  the  less  fit  strains  to  partially  establish  themselves. 

Although  our  numerical  results  focus  on  the  two- virus,  two-drug  case,  the  development 
of  more  realistic  models  using  data  from  multidrug  studies  is  planned  for  the  future.  It  may 
turn  out  that  the  best  way  to  delay  the  onset  of  AIDS  is  via  the  intelligent  use  of  a  wide 
range  of  therapies  (RT  inhibitors,  protease  inhibitors,  reconstitution  of  the  immune  system, 
immunotherapy,  gene  therapy,  etc.).    The  model  and  analysis  presented  here  provides  the 
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framework  for  the  development  of  such  therapeutic  strategies. 

Finally,  our  approach  to  this  problem  circumvents  the  usual  obstacles  inherent  in  solv- 
ing high-dimensional  nonlinear  control  problems.  This  method,  which  appears  to  be  new, 
has  potential  applications  in  a  wide  variety  of  control  problems  in  epidemiology  and  ecology: 
besides  allowing  for  mutation  among  multiple  variants  of  entities  (in  this  case,  viruses),  the 
approach  can  also  incorporate  discrete  age  classes  (e.g.,  for  optimal  vaccinations  of  measles) 
and  discrete  spatial  (e.g..  lattice)  structures  (e.g.,  for  dynamic  control  of  spatial  epidemics). 
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APPENDIX  A 

The  solution  to  (24)- (27)  is 
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Xiqjiftdjix—) — ^ +  (v3  -  TjMl-h : m 7 

(42) 
where  the  constants  appearing  in  these  equations  are 

c,=i(-T^-.L --**-)-%*  --i(^ +  .-.))■       («) 

^y,  =  -QiiPidi-  (  7 —  +  {Vi- 


[t    VA'j  —  Q,  A,  —  Q;     q,  —  Av 
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+9iiftdI-  j--  -)(-7] --('-,-  i ) ; -) 


A 


*iVi 


(aj  -  %)(«<  -  a,)  kj-aj   (ai-kj 


+  qJl3JdJ{x 


*iVj 


fi    \{kj  -a}){Qj  +fi  -at 

CVlTil  _  A   /         7T;t/j 


-  to  -  T2*-)" 


kj  -  a }    q,  -  kj  -  fi] 


KiVi 


a,  =  ~^±-  +  ft-  I  T^^TT  +  («i  -  ^-)< 


+ft(x- 


fcj  -  a,        "'//  V(^«  _Q<)2 

A    / ^ 

ft    \(ki  —  ati)(ki  —  cti  —  p) 


(Vi 


-TTiqufiidi-  (  ,,     '  [  „(*  -  -, -)  +  (vi 


k  -  an 


ft  \(ki  -  at)2  ki  -  a, 


A 

+  7Ttqit3ldt{x ) 


n-«y, 


+  H  ("-'/,.. ■■V'j 


//      \//(A*,  -  Cti)(ki  -  Q,  -  //) 


j/< 


fa 


A-,  —  Q,    Q,  —  A'ly 


fcj  -  Q(    /((a,  -  A,  -  fi) 


(44) 


;Aj  -  a,  )(o,  -  a,-)(fci  _  Qj)  fcj  _  aj    (a«  ~  fcj)(^  ~  fci; 


A 


Xiqjifydjix ) 


77,11, 


+  (»i- 


7T, 


2/j 


■): 


l 


//    \(&j  -  Qj)(Qi ;-Qj  -  /<)(*";  -  «j  -  /0  ^i-«j    (Ot-^i -/*)(* 


^  -  a'  : 

(45) 


APPENDIX  B 


The  approximate  value  function  Ve  is  given  by 

rT      I 


f  ECt^OO  +  ^OO)* 


L    77 ; —  +  (u«  "  7 )_ 


fc 


+  e 


Cw*«(l  -  e-'tr-*>)      C„(i-e-*iCr-t)) 


0,(^    -    "t; 


+ 


ftA/^^e-^1"-*)-!: 


( 


//    v       at(&i  -  at)2 


+  (Vi 


«iVi 


h  -  a,- 


1,2 


) 


A 


+Pi{x-  -) 


TTiJ/i 


(  -i  >,+u)(T-t)  _  y 


+  (», 


- //,      1  -  c-(^+"Hr-n 


//'  MA',  -Q,)(I',  -  Qj  -  fi){al ;  +  fi)  ki-Oi  Vih  +  V 


) 
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-Ki<4u:Jidi-\ 


KiVi       ,(otT+  l)c-a'{T-,]  -{alt  +  1)       l-e-a'<T-f) 


+ 


(j,\(ki-ai)2  of  Q,(kt-Qi) 

+(vi-—     -)-         -Tjj-   —r\ J 


k;  —  q. 


m<*i-h) 


\^         r|.y.(e-(^)(r-0_i)  w.y.  e-(^)(T-t)  _  ! 

+7Tigti^di(x )l  — — — —  +  [Vi  - — 1 

//    ^n(ki-ai){ki-ai- fi){ai  + fi)  k\  -  a,   //(q,-  -  kt  -  /.i)(kt  +  fi)/ 

5|Viftift"  ^  W*i  -  aj)(ai  -  «,-)(<*  -  kt)      [V>      k3  -  Qj> *,-(**  -  *,-)(*  -  k^ 

-mjiPjdj{x  — )(- — — — -r 

/'    M*j  -  Qj)(ai  -Qj-  fi){ki  -otj-  fi){Qj  +  /') 


+  (»i 


7T 


yj 


e-{kj+n)(T-t)  _  y 


kj  -  a3    {at  -  kj  -  fi)(ki  -  kj  -  fi)(k3  +  fi] 


))])■ 


(46) 
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